
    Speeding up without loss of accuracy: Item position effects on performance in university exams

    The quality of exams drives the test-taking behavior of examinees and is a proxy for the quality of teaching. As most university exams have strict time limits, and speededness is an important measure of the cognitive state of examinees, speededness might be used to assess the connection between exam quality and examinee performance. The practice of randomization within university exams enables the analysis of item position effects within individual exams as a measure of speededness, and thus the creation of a measure of exam quality. In this research, we use generalized linear mixed models to evaluate item position effects on response accuracy and response time in a large dataset of randomized exams from Utrecht University. We find an effect of item position on response time for most exams, but not on response accuracy, which might be a starting point for identifying factors that influence speededness and can affect the mental state of examinees.
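    The design described above can be illustrated with a toy simulation: because items are randomized, each examinee sees every position once, so a fixed position effect on log response time can be recovered even in the presence of person-level random intercepts. This is a minimal sketch, not the authors' GLMM; all numbers (true slope, sample sizes, noise levels) are invented for illustration.

```python
import random
import statistics

random.seed(42)

# Hypothetical data-generating model:
# log response time = person intercept + TRUE_BETA * item position + noise.
TRUE_BETA = -0.03           # assumed speed-up per position (invented)
N_PERSONS, N_ITEMS = 200, 40

records = []
for _ in range(N_PERSONS):
    person_intercept = random.gauss(3.0, 0.5)    # random person effect
    positions = list(range(N_ITEMS))
    random.shuffle(positions)                     # randomized item order
    for pos in positions:
        log_rt = person_intercept + TRUE_BETA * pos + random.gauss(0, 0.2)
        records.append((pos, log_rt))

# Recover the fixed position effect with a pooled least-squares slope
# (a crude stand-in for the GLMM used in the paper).
xs = [r[0] for r in records]
ys = [r[1] for r in records]
mx, my = statistics.mean(xs), statistics.mean(ys)
beta_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)
print(round(beta_hat, 3))
```

    Because every person answers every position exactly once, the person intercepts are orthogonal to position, which is why even the pooled slope recovers the effect here.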

    VAST: a practical validation framework for e-assessment solutions

    The influx of technology in education has made it increasingly difficult to assess the validity of educational assessments. The field of information systems often ignores the social dimension during validation, whereas educational research neglects the technical dimensions of designed instruments. The inseparability of social and technical elements forms the bedrock of socio-technical systems. Therefore, the current lack of validation approaches that address both dimensions is a significant gap. We address this gap by introducing VAST: a validation framework for e-assessment solutions. Examples of such solutions are technology-enhanced learning systems and e-health applications. Using multi-grounded action research as our methodology, we investigate how we can synthesise existing knowledge from information systems and educational measurement to construct our validation framework. We develop an extensive user guideline complementing our framework and find through expert interviews that VAST facilitates a comprehensive, practical approach to validating e-assessment solutions.

    Urnings: A new method for tracking dynamically changing parameters in paired comparison systems

    We introduce a new rating system for tracking the development of parameters based on a stream of observations that can be viewed as paired comparisons. Rating systems are applied in competitive games, adaptive learning systems, and platforms for product and service reviews. We model each observation as an outcome of a game of chance that depends on the parameters of interest (e.g. the outcome of a chess game depends on the abilities of the two players). Determining the probabilities of the different game outcomes is conceptualized as an urn problem, where a rating is represented by a probability (i.e. the proportion of balls in the urn). This setup allows for evaluating the standard errors of the ratings and performing statistical inferences about the development of, and relations between, parameters. Theoretical properties of the system in terms of the invariant distributions of the ratings and their convergence are derived. The properties of the rating system are illustrated with simulated examples, and its potential for answering research questions is illustrated using data from competitive chess, a movie review system, and an adaptive learning system for math.
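    The urn idea above can be sketched in a few lines: each player's rating is the proportion of "green" balls in an urn of fixed size, a game outcome is simulated by drawing from both urns, and balls move toward the observed result when simulation and observation disagree. This is a deliberately simplified sketch of the general idea; the published Urnings algorithm includes additional machinery (e.g. corrections for adaptive item selection) that is omitted here, and all parameters are invented.

```python
import random

random.seed(7)

def simulate_win(green1, green2, n):
    """Simulate a game from two urns of size n: player 1 'wins' when they
    draw green while player 2 draws red; ties (same colour) are redrawn."""
    while True:
        d1 = random.random() < green1 / n
        d2 = random.random() < green2 / n
        if d1 != d2:
            return d1

def update(green1, green2, n, player1_won):
    """Move one ball toward the observed result when the simulated
    outcome disagrees with it, keeping both counts within [0, n]."""
    expected = simulate_win(green1, green2, n)
    if player1_won and not expected and green1 < n and green2 > 0:
        green1, green2 = green1 + 1, green2 - 1
    elif (not player1_won) and expected and green2 < n and green1 > 0:
        green1, green2 = green1 - 1, green2 + 1
    return green1, green2

# Strong player (hypothetical true win probability 0.8) vs weak player,
# both starting from a rating of 0.5.
n = 40
g1 = g2 = n // 2
for _ in range(500):
    won = random.random() < 0.8
    g1, g2 = update(g1, g2, n, won)
print(g1 / n, g2 / n)
```

    Because the rating is a proportion of balls in a finite urn, its sampling variability (and hence a standard error) follows directly from the urn size, which is the property the abstract highlights.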

    Embracing Trustworthiness and Authenticity in the Validation of Learning Analytics Systems

    Learning analytics sits in the middle space between learning theory and data analytics. The inherent diversity of learning analytics manifests itself in an epistemology that strikes a balance between positivism and interpretivism, and knowledge that is sourced from theory and practice. In this paper, we argue that validation approaches for learning analytics systems should be cognisant of these diverse foundations. Through a systematic review of learning analytics validation research, we find that there is currently an over-reliance on positivistic validity criteria. Researchers tend to ignore interpretivistic criteria such as trustworthiness and authenticity. In the 38 papers we analysed, researchers covered positivistic validity criteria 221 times, whereas interpretivistic criteria were mentioned only 37 times. We argue that learning analytics can only move forward with holistic validation strategies that incorporate “thick descriptions” of educational experiences. We conclude by outlining a planned validation study using argument-based validation, which we believe will yield meaningful insights by considering a diverse spectrum of validity criteria.

    Tracking a multitude of abilities as they develop

    Recently, the Urnings algorithm (Bolsinova et al., 2022, J. R. Stat. Soc. Ser. C Appl. Statistics, 71, 91) was proposed for tracking the development of the abilities of learners and the difficulties of items in adaptive learning systems. It is a simple and scalable algorithm suited for large-scale applications in which large streams of data come into the system and on-the-fly updating is needed. Compared to alternatives like the Elo rating system and its extensions, the Urnings rating system allows the uncertainty of the ratings to be evaluated and accounts for adaptive item selection which, if not corrected for, may distort the ratings. In this paper we extend the Urnings algorithm to allow for both between-item and within-item multidimensionality. This allows for tracking the development of interrelated abilities at both the individual and the population level. We present formal derivations of the multidimensional Urnings algorithm, illustrate its properties in simulations, and present an application to data from an adaptive learning system for primary school mathematics called Math Garden.

    All That Glitters Is Not Gold: Towards Process Discovery Techniques with Guarantees

    The aim of a process discovery algorithm is to construct from event data a process model that describes the underlying, real-world process well. Intuitively, the better the quality of the event data, the better the quality of the model that is discovered. However, existing process discovery algorithms do not guarantee this relationship. We demonstrate this by using a range of quality measures for both event data and discovered process models. This paper is a call to the community of IS engineers to complement their process discovery algorithms with properties that relate qualities of their inputs to those of their outputs. To this end, we distinguish four incremental stages for the development of such algorithms, along with concrete guidelines for the formulation of relevant properties and experimental validation. We also use these stages to reflect on the state of the art, which shows the need to move forward in our thinking about algorithmic process discovery. (Comment: 13 pages, 4 figures. Submitted to the International Conference on Advanced Information Systems Engineering, 202)

    Readability Metrics for Machine Translation in Dutch: Google vs. Azure & IBM

    This paper introduces a novel method to predict when a Google translation is better than other machine translations (MT) in Dutch. Instead of considering fidelity, this approach considers fluency and readability indicators for when Google ranked best. This research explores an alternative approach in the field of quality estimation. The paper contributes by publishing a dataset with sentences translated from English to Dutch, with human-made classifications on a best-worst scale. Logistic regression shows a correlation between T-Scan output, such as readability measurements like lemma frequencies, and when the Google translation was better than Azure and IBM. The last part of the results section shows the prediction possibilities: first by logistic regression and second by an automatically generated machine learning model, which reach accuracies of 0.59 and 0.61, respectively.
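    The prediction setup described above, readability features in and a binary "Google ranked best" label out, can be sketched with a plain logistic regression trained by gradient descent. The two features and their distributions are invented stand-ins for T-Scan output (e.g. mean lemma frequency, mean sentence length), not the paper's actual data.

```python
import math
import random

random.seed(0)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression, no libraries."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))          # avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if z > 0 else 0

# Invented toy data: two readability-style features per sentence,
# label 1 = "Google translation ranked best".
labels = [random.random() < 0.5 for _ in range(200)]
X = [(random.gauss(1.0 if l else -1.0, 0.7),
      random.gauss(0.5 if l else -0.5, 0.7)) for l in labels]
y = [1 if l else 0 for l in labels]

w, b = train_logreg(X, y)
acc = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(round(acc, 2))
```

    On the toy data the classes are well separated, so accuracy is high; the paper's 0.59–0.61 reflects that real readability features carry only a weak signal for this task.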

    Exploring the Utility of Dutch Question Answering Datasets for Human Resource Contact Centres

    We explore the use case of question answering (QA) by a contact centre for 130,000 Dutch government employees in the domain of questions about human resources (HR). HR questions can be answered using personnel files or general documentation, with the latter being the focus of the current research. We created a Dutch HR QA dataset with over 300 questions in the format of the SQuAD 2.0 dataset, which distinguishes between answerable and unanswerable questions. We applied various BERT-based models, either directly or after finetuning on the new dataset. The F1-scores reached 0.47 for unanswerable questions and 1.0 for answerable questions depending on the topic; however, large variations in scores were observed. We conclude that more data are needed to further improve performance on this task.
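    The SQuAD 2.0 format mentioned above marks unanswerable questions with an `is_impossible` flag, and extractive QA is typically scored with a token-overlap F1. The sketch below shows a minimal record in that format plus the standard F1 computation; the HR context, questions, and answers are invented for illustration.

```python
from collections import Counter

# One context paragraph with an answerable and an unanswerable question,
# following the SQuAD 2.0 record structure (invented example content).
record = {
    "context": "Employees accrue 24 vacation days per calendar year.",
    "qas": [
        {"question": "How many vacation days do employees accrue per year?",
         "answers": [{"text": "24 vacation days", "answer_start": 17}],
         "is_impossible": False},
        {"question": "What is the parking budget?",
         "answers": [],
         "is_impossible": True},   # unanswerable, as in SQuAD 2.0
    ],
}

def token_f1(prediction, gold):
    """SQuAD-style F1: harmonic mean of token precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("24 days", "24 vacation days"))
```

    For unanswerable questions, a prediction scores 1.0 only if the model also abstains, which is why the unanswerable and answerable F1-scores are reported separately in the abstract.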

    Uncovering the structures of privacy research using bibliometric network analysis and topic modelling

    Purpose: This paper argues that privacy research is divided into distinct communities and rarely considered as a singular field, harming its disciplinary identity. The authors collected 119,810 publications and over 3 million references to perform a bibliometric domain analysis as a quantitative approach to uncovering the structures within the privacy research field. Design/methodology/approach: The bibliometric domain analysis combines a directed citation network and a topic model of published privacy research. The network contains 83,159 publications and 462,633 internal references. A latent Dirichlet allocation (LDA) topic model built from the same dataset offers an additional lens on structure by classifying each publication into 36 topics, combined with the network data. The combined outcomes of these methods are used to investigate the structural position and topical make-up of the privacy research communities. Findings: The authors identified the research communities and categorised their structural positioning. Four communities form the core of privacy research: individual privacy and law, cloud computing, location data, and privacy-preserving data publishing. The latter is a macro-community of data mining, anonymity metrics, and differential privacy. Surrounding the core are applied communities. Further removed are communities with little influence, most notably the medical communities, which make up 14.4% of the network. The topic model shows system design as a potentially latent community. Noteworthy is the absence of a centralised body of knowledge on organisational privacy management. Originality/value: This is the first in-depth, quantitative mapping study of all privacy research.
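    The directed citation network at the heart of the analysis above can be sketched with a toy example: publications are nodes, internal references are directed edges, and heavily cited nodes sit toward the structural core of the field. Paper IDs and edges below are invented for illustration.

```python
from collections import defaultdict

# (citing paper, cited paper) pairs, i.e. internal references.
edges = [("p1", "p3"), ("p2", "p3"), ("p4", "p3"),
         ("p3", "p5"), ("p4", "p5"), ("p2", "p1")]

# In-degree = number of internal citations a publication receives.
in_degree = defaultdict(int)
for citing, cited in edges:
    in_degree[cited] += 1

# Publications ranked by citations received; the most-cited ones
# anchor the core communities in a bibliometric map.
ranked = sorted(in_degree.items(), key=lambda kv: -kv[1])
print(ranked)
```

    The study layers an LDA topic model on top of such a network, so each node carries both a structural position (from the edges) and a topical profile (from the model).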

    All that glitters is not gold: Four maturity stages of process discovery algorithms

    A process discovery algorithm aims to construct a process model that represents the real-world process stored in event data well; it is precise, generalizes the data correctly, and is simple. At the same time, it is reasonable to expect that better-quality input event data should lead to constructed process models of better quality. However, existing process discovery algorithms omit the discussion of this relationship between the inputs and outputs and, as it turns out, often do not guarantee it. We demonstrate the latter claim using several quality measures for event data and discovered process models. Consequently, this paper calls for more rigor in the design of process discovery algorithms, including properties that relate the qualities of the inputs and outputs of these algorithms. We present four incremental maturity stages for process discovery algorithms, along with concrete guidelines for formulating relevant properties and experimental validation. We then use these stages to review several state-of-the-art process discovery algorithms to confirm the need to reflect on how we perform algorithmic process discovery.
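    One of the simplest model-quality measures alluded to above is trace-level fitness: the fraction of log traces that a discovered model can replay. The sketch below uses an invented event log and represents the "model" simply as the set of activity sequences it accepts, which is a stand-in for a real process model, not any of the reviewed algorithms.

```python
# Invented event log: each trace is a sequence of activities.
log = [
    ("register", "check", "approve"),
    ("register", "check", "reject"),
    ("register", "approve"),          # deviating trace: skips the check
    ("register", "check", "approve"),
]

# Hypothetical discovered model, represented by its accepted traces.
model_language = {
    ("register", "check", "approve"),
    ("register", "check", "reject"),
}

# Trace-level fitness: fraction of log traces the model accepts.
fitness = sum(trace in model_language for trace in log) / len(log)
print(fitness)
```

    A property of the kind the paper asks for would then relate such input-side measures (e.g. log noise levels) to output-side measures like this fitness score.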